Abstract

Twenty years of AI research in knowledge representation has
produced frame knowledge representation systems (FRSs) that
incorporate a number of important advances. However, FRSs lack two
important capabilities that prevent them from scaling up to
realistic applications: they cannot provide high-speed access to
large knowledge bases (KBs), and they do not support shared,
concurrent KB access by multiple users. Our research investigates
the hypothesis that one can employ an existing database management
system (DBMS) as a storage subsystem for an FRS, to provide
high-speed access to large, shared KBs. We describe the design and
implementation of a general storage system that incrementally loads
referenced frames from a DBMS, and saves modified frames back to
the DBMS, for two different FRSs: Loom and Theo. We also present
experimental results showing that the performance of our prototype
storage subsystem exceeds that of flat files for simulated
applications that reference or update up to one third of the frames
from a large Loom KB.


To appear in CIKM-94 (Conference on Information and Knowledge
Management), Gaithersburg MD, 1994.


1 Introduction

Twenty years of AI research in knowledge representation has
produced frame knowledge representation systems (FRSs) that
incorporate a number of important advances [6, 4]. FRSs provide
inference capabilities such as production rules and classification
to derive the deductive consequences of explicit information.
Defeasible inheritance allows regularities (defaults) to be encoded
and overridden for many objects with minimal effort. And run-time
schema alteration capabilities support the evolution of complex
knowledge bases (KBs).

However, FRSs lack two important capabilities that prevent them
from scaling up to realistic applications: they cannot provide
high-speed access to large KBs, and they do not support shared,
concurrent KB access by multiple users. All existing FRSs process
their KBs in data structures that exist entirely in virtual memory,
forcing users to read the whole KB into memory from disk before its
use. To provide persistence, KBs are written to disk files in their
entirety. Saving or loading a KB can therefore become an expensive
operation, taking time proportional to the size of the KB. An
effective cap is placed on the size of a KB by the amount of time
that users are willing to wait for save and load operations, with
an absolute cap based on the size of virtual memory.

A more favorable arrangement would be one in which load time and
memory usage are proportional to the number of frames referenced,
and save time is proportional to the number of frames updated. This
is the behavior supported by conventional database systems, which
also offer other important storage management facilities, including
transactions, error recovery, and concurrent access. We combine the
information management capabilities of frame representation systems
with the storage management capabilities of conventional database
systems to form a single intelligent, persistent, and scalable
information management system.

Our research investigates the hypothesis that we can employ an
existing database management system (DBMS) as a storage subsystem
for an FRS, to provide high-speed access to large, shared KBs. This
paper discusses the design requirements that we identified for this
storage subsystem, presents alternative storage-subsystem
architectures that satisfy those requirements, and gives
performance measurements from our prototype implementation.

Our prototype storage subsystem utilizes a commercial relational
DBMS (RDBMS). To gain a fuller understanding of the issues involved
in developing a storage system for an FRS, we have integrated this
storage subsystem with two FRSs: Theo [11] and Loom [5, 8]. Loom is
in the KL-ONE family of FRSs, whereas Theo is in the RLL family;
differences in the philosophies and implementations of Theo and
Loom affect their exact requirements for a storage subsystem.

2 Storage Subsystem Architecture

McKay et al. describe four broad alternative architectures for
coupling AI systems (FRSs) with database systems (see Figure 1)
[10]. Our primary goal is to provide a storage system for an FRS
with minimal disruption to the end user of the FRS. We therefore
chose strategy (a) in Figure 1, namely to submerge a DBMS within an
FRS such that the presence of the DBMS is invisible to the end
user. For example, the user need not have any knowledge of the DBMS
schema, nor must the user establish a mapping between the schemas
in the DBMS and the FRS. The Intelligent Database Interface (IDI)
system developed by McKay et al. uses strategy (d) because their
main objective is to import information from an existing DBMS into
a knowledge representation system.

Figure 1: A characterization by McKay et al. of alternative
strategies for coupling an AI system such as an FRS with a database
system.

Once a general architecture is selected, several more choices must
be made. For instance, what type of DBMS is best suited to the role
of a frame storage system? Because the answer to this question is
not apparent, we are experimenting with a commercial relational
DBMS, and an extensible storage management system called EXODUS
from the University of Wisconsin [3]. This paper presents
performance results for the relational DBMS.

Another decision concerns the manner in which FRS information is
organized in the DBMS. One of our goals is that the user should not
have to design a DBMS schema for every new KB. Instead, we as
designers of the storage system must create a generic DBMS schema
that accommodates all potential FRS information. In fact, more than
one such generic schema exists, and we plan to evaluate the
performance of several schemas empirically.

Another series of choices concerns the granularity at which
information is transferred between the DBMS and the FRS. Our goals
are for KB loading to take time proportional to the amount of
information the application actually references, and for KB saving
to take time proportional to the number of frames updated in the
KB. The simplest mechanism that satisfies these constraints is to
transfer a single frame from the DBMS to the FRS when the user
application references a frame that is not currently in virtual
memory. This demand-loading approach is analogous to the use of
page faulting in operating systems. Our current implementation uses
this approach, but we are also studying alternatives such as
transferring only a piece of a referenced frame (e.g., a single
slot), or transferring a cluster of related frames (e.g., a
referenced frame plus all frames that it references).

In our current implementation, all modified frames are transferred
from the FRS to the DBMS when the user performs a KB-save
operation.

3 Storage Subsystem Implementation

The storage subsystem transmits ASCII encodings of Loom and Theo
frames between the DBMS and the FRS. We first provide an overview
of the frame structures that Loom and Theo employ. We then discuss
the architecture of the storage subsystem, and modifications we
made to Loom and Theo to interface them to the storage subsystem.


3.1 Loom Structures and Operation

A Loom KB contains three types of frames: concepts, instances, and
relations (we have simplified the description of Loom for
expository purposes). A concept consists of a name and a
definition. The concept definition is a set of necessary and
sufficient conditions that an instance must meet in order to be an
instance of the concept. The definition is a list of zero or more
super-concepts, constraints on slot values, predicates, and other
types of constraints and characteristics. Given this information,
the Loom classifier arranges all concepts into a subsumption
(generalization) hierarchy.

A Loom relation (not to be confused with the usual database
definition of a relation as a table) is a KB-wide definition of the
properties of a slot. We can define a domain and a range for a
relation. The domain indicates all concepts whose instances can
have values for the relation, and the range limits the types of
objects that can serve as values.

Instances have one or more parent concepts and some set of slot
(attribute) values. Based on these characteristics, the Loom
classifier can infer the concepts to which the instance belongs.
For example, if Person is a primitive concept that is in the domain
of the relations Sex and Age, and if a Female-person is a Person
with Sex=Female, and a Girl is a Person with Sex=Female and Age<18,
then Loom will correctly infer that Girl belongs below
Female-person in the concept hierarchy. If Sally is an instance of
Person with Sex=Female and Age=17, then Loom will correctly infer
that Sally is an instance of Girl. If Sally has a birthday and her
age changes to 18, then Loom will revise that classification
automatically.

Most commonly, Loom performs two types of inferences: in
backward-chaining mode values are computed only when requested, and
in forward-chaining mode the consequences of an assertion are
computed as soon as the assertion is made. All our tests and
experiments have used Loom's backward-chaining mode. We believe our
system would also work with the forward-chaining mode. However, in
order to make the required inferences, creation or modification of
a single frame could trigger a large number of frame faults by
Loom's classifier, which could hurt performance. We have not
attempted to support Loom's production-rule inference.


3.2 Theo Structures and Operation

Because Theo is also an FRS, it shares many characteristics with
Loom. Theo frames are also arranged in a generalization hierarchy,
and Theo frames consist of slots that contain values. However, Theo
classes do not have associated definitions, and Theo does not
compute the classification operation. Given the basic structural
similarity of Loom and Theo, it is natural to develop a storage
system that can serve both systems.

For simplicity the remainder of the paper usually mentions Loom
only. All statements we make about the interaction of Loom with our
storage system also apply to Theo except where we state otherwise.

3.3 Relational Schema

The relational DBMS schema we employ to store Loom KBs consists of
five relational tables. An example is shown in Figure 2.

The Frames table contains frame definitions. A frame definition is
a string of text that provides Loom with all the information
necessary to create the frame. We place concept and instance
definitions together in the same table, because there are occasions
when a frame is referenced without its type being known; the type
field then identifies the frame as a concept or instance. Most
definitions will be relatively short, but some may be quite long.
For this reason, a sequence number is included, in case a
definition exceeds the DBMS maximum column size and has to be split
into multiple tuples. We record the number of parents of each frame
to enable the storage subsystem to perform certain optimizations. A
KB identifier is included in each table, to enable multiple KBs to
be stored in one DBMS. The KB Mapping table associates a KB name
with its unique identifier.

The tables Supers and Instance Classes enable reconstruction of the
concept and instance hierarchy outside of Loom. The former lists
the super-sub relationships between concepts; the latter documents
the relationship between instances and their parent concepts.
Separate indices are built to retrieve the subconcepts of a
concept, the superconcepts of a concept, the instances of a
concept, and the parent concepts of an instance. This information
is necessary for two reasons. First, in order for a concept or
instance to be defined in Loom, all parent concepts must already be
loaded, or Loom will not be able to classify the new frame. Thus,
we must be able to determine the concepts from which a given
concept or instance inherits. Second, the definition does not
contain information about subconcepts or instances of a concept, so
we must provide that information to Loom directly, outside the
normal channels.
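
The table layout described above can be illustrated with a small
sketch. It is not the schema of Figure 2: the column names, the
tiny MAX_COL limit, and the use of Python's sqlite3 module in place
of the commercial RDBMS are all assumptions made for illustration,
and only the four tables named in the text are shown. The sketch
splits a long definition into sequence-numbered tuples and
reassembles it on retrieval.

```python
import sqlite3

MAX_COL = 40  # hypothetical column-size limit, kept small to force splitting


def open_store():
    """Create an in-memory database with an assumed version of the tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE kb_mapping (kb_id INTEGER PRIMARY KEY, kb_name TEXT UNIQUE);
        CREATE TABLE frames (kb_id INTEGER, name TEXT, type TEXT,
                             n_parents INTEGER, seq INTEGER, definition TEXT,
                             PRIMARY KEY (kb_id, name, seq));
        CREATE TABLE supers (kb_id INTEGER, sub_concept TEXT, super_concept TEXT);
        CREATE TABLE instance_classes (kb_id INTEGER, instance TEXT, concept TEXT);
        CREATE INDEX supers_by_sub ON supers (kb_id, sub_concept);
        CREATE INDEX supers_by_super ON supers (kb_id, super_concept);
        CREATE INDEX classes_by_instance ON instance_classes (kb_id, instance);
        CREATE INDEX classes_by_concept ON instance_classes (kb_id, concept);
    """)
    return db


def store_frame(db, kb_id, name, frame_type, parents, definition):
    """Store one frame, splitting its definition into MAX_COL-sized tuples."""
    chunks = [definition[i:i + MAX_COL]
              for i in range(0, len(definition), MAX_COL)] or [""]
    db.executemany(
        "INSERT INTO frames VALUES (?, ?, ?, ?, ?, ?)",
        [(kb_id, name, frame_type, len(parents), seq, chunk)
         for seq, chunk in enumerate(chunks)])
    if frame_type == "concept":
        sql = "INSERT INTO supers VALUES (?, ?, ?)"
    else:
        sql = "INSERT INTO instance_classes VALUES (?, ?, ?)"
    db.executemany(sql, [(kb_id, name, parent) for parent in parents])


def fetch_definition(db, kb_id, name):
    """Return (type, definition) with the chunks rejoined, or None if absent."""
    rows = db.execute(
        "SELECT type, definition FROM frames "
        "WHERE kb_id = ? AND name = ? ORDER BY seq", (kb_id, name)).fetchall()
    if not rows:
        return None
    return rows[0][0], "".join(chunk for _, chunk in rows)
```

Because the type is stored alongside the definition, a caller can
fetch a frame by name alone and learn afterwards whether it is a
concept or an instance, which is the reason given above for keeping
both kinds of frame in one table.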

3.4 Frame Faulting

A frame fault occurs when an application (or Loom itself)
references a frame F that is not in virtual memory. Examples of
frame references include retrieving or altering slot values of F,
and requesting a list of the parents of F. When faulting a frame
into memory, we retrieve its definition from the DBMS by issuing
one or more SQL queries (see footnote 1). This approach allows
multiple users to access and to update the same KB from the RDBMS
server in a distributed (but uncoordinated) fashion. Our future
work will investigate methods of controlling multiple updates to a
shared KB.

Footnote 1: We call on the IDI from Paramax to communicate with the
RDBMS server from Lisp using SQL queries that can be transported
over a network. We are not employing the full power of the IDI; we
utilize only the module of the IDI that formulates and unpacks SQL
queries.

We then call standard Loom functions to add the frame to its Loom
KB. This process is complicated by the fact that most frames are
related (connected) to other frames in the KB. For example, a
concept is related to its superconcepts, subconcepts, and
instances. An instance will contain references to its parent
concepts. In addition, an instance may contain references to other
instances serving as fillers of the instance's slots. Loom normally
expects all of these other frames to be present in memory. When
faulting a frame into memory, we also give Loom just enough of the
context required to process the faulted frame.

The process of faulting F into memory involves three steps:
processing the parents of F, informing Loom of the definition of F,
and processing connections from F to frames other than its parents.

Connections to parent frames. We treat connections to parent frames
differently than connections to subconcepts, instances, and
slot-value references. The RDBMS tables Supers and Instance Classes
record the direct parents of every frame (as inferred by Loom by
classification before the KB was last saved). When processing a
fault to frame F, we first generate faults to every direct parent
of F that is not currently in virtual memory (faults to the parents
of these parents may then be generated recursively). Therefore, all
parents of F are loaded before F is defined.
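
The parents-first ordering can be sketched in a few lines. This is
an illustration under stated assumptions rather than the actual
Loom-side code: `parents_of` is a hypothetical dictionary standing
in for the Supers and Instance Classes lookups, `memory` is the set
of frame names already resident, and the hierarchy is assumed to be
acyclic.

```python
def fault_in(name, parents_of, memory, load_order):
    """Load `name` only after every ancestor not yet in memory is loaded."""
    if name in memory:
        return
    # Fault in each direct parent first; this recurses up the hierarchy.
    for parent in parents_of.get(name, ()):
        fault_in(parent, parents_of, memory, load_order)
    # All parents are now resident, so the frame itself can be defined.
    memory.add(name)
    load_order.append(name)
```

Called on a leaf instance, the function records ancestors from the
top of the hierarchy downward and the instance last, so the
classifier never sees a frame whose parents are missing.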

Connections to other frames. Loom implements all frame references
as Lisp pointers to the actual Loom data structures for the frames
in question. In the RDBMS definition of the frame, these references
are symbolic frame names. Normally, when Loom processes a frame
definition it internally converts the names to pointers to the
actual objects. However, consider the situation where a slot of F
references a frame G, and G has not yet been loaded from the RDBMS.
There would be no Loom data structure to which F could point.
Although we could now fault in G, we wish to avoid loading a frame
just because we need to point to it, because for some KBs this
strategy could recursively fault in the entire KB. Instead, we
create a stub for G (a dummy object with a name but containing no
information) to serve as a place-holder for G. Loom can store and
pass around pointers to a stub just as it would a pointer to any
frame. A pointer to G is then manually inserted into the
appropriate slot of F. A future attempt to retrieve information
(other than the name) from a stub, or to write information to it,
causes a trap to the storage subsystem, and the actual frame is
then faulted in. Loom inferencing is not affected by the presence
of stubs, as any attempt to reason using a stub will result in a
frame fault. This mechanism is analogous to the swizzling operation
performed in object-oriented database management systems
(OODBMSs), which convert an object ID into a pointer [7]. Object
IDs are typically numbers; symbolic frame names in Loom are
analogous to object IDs.
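
The stub mechanism can be illustrated with a short Python sketch.
It is an analogy, not the CLOS code used with Loom: the class names
`Frame` and `Stub` and the `loader` callback are hypothetical, only
reads are trapped (writes are omitted for brevity), and Python's
reassignment of `__class__` stands in for the CLOS class-conversion
operation so that existing pointers to the stub remain valid.

```python
class Frame:
    """A fully loaded frame: a name plus a dictionary of slot values."""

    def __init__(self, name, slots):
        self.name = name
        self.slots = slots


class Stub:
    """Placeholder that knows only its name; other reads fault the frame in."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader  # hypothetical callback: name -> slot dict

    def __getattr__(self, attr):
        # Python calls this only when normal lookup fails, so reading
        # `name` never reaches here and never triggers a fault.
        if attr.startswith("_"):
            raise AttributeError(attr)
        slots = self._loader(self.name)   # the trap to the storage subsystem
        self.__class__ = Frame            # convert the stub in place
        self.slots = slots
        return getattr(self, attr)
```

Reading `stub.name` is free, whereas the first read of `stub.slots`
calls the loader once and leaves behind an ordinary `Frame` object
at the same address, so later reads cause no further faults.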

This topic is one of the more significant differences between Loom
and Theo. In Theo, connections from one frame to another are
implemented as frame names that are Lisp symbols, rather than as
pointers to a frame data structure (a CLOS object) as in Loom. Lisp
symbols are of course implemented as pointers, but these pointers
point to the Lisp symbol table where all symbols are interned. The
Theo frame definition is stored on the property list of the symbol.
We can consider Lisp symbol interning to be a stub mechanism of
sorts, because entries in this symbol table provide a place to
which symbolic frame references can point. Therefore, no stub
frames need to be created for the Theo storage subsystem.

Figure 2: The relational schema used to store Loom KBs in an RDBMS,
with sample data.

Defining the frame. The storage system retrieves the definition of
F from the Frames table of the RDBMS, and invokes Loom procedures
to define F based on the retrieved definition string. If a stub
definition already existed for F because of a connection from a
previously faulted frame to F, the stub object is directly
converted to a Loom object (Loom is written using CLOS, which
allows this class-conversion operation). This approach maintains
the validity of all previously existing pointers to the stub.

Modifications to Loom. We made several modifications to Loom to
implement demand loading of frames. (1) Normally, when Loom
converts an identifier to an object pointer, it checks a collection
of hash tables to find the object. We changed Loom so that if the
hash table lookup fails, a frame fault is triggered in most cases.
(2) No frame fault is triggered, however, if a flag is set to
indicate that we are already in the process of faulting in a frame.
In that case, the implementation finds or creates a stub for the
object, and returns a pointer to the stub. The frame-creation
routines of Loom have been altered to check for a stub for an
object before creating a new object: if the stub already exists, it
is directly converted to the new object. (3) Before processing any
instance or query, Loom normally performs a series of operations on
all concepts that have been defined or modified since the last
instance or query was processed, to ensure that the concept
hierarchy is properly formed. This sealing process involves
following all superconcept and subconcept links. However, in the
context of the storage subsystem, following subconcept links would
cause many additional concepts to be faulted into memory, even
though they have not been referenced. We have altered Loom to
follow links only to subconcepts already in memory.
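
Modifications (1) and (2) can be sketched as a lookup routine
guarded by a flag. The sketch below is a simplified illustration,
not Loom code: `FrameTable`, its dictionary-based objects, and the
`definitions` mapping (a stand-in for definitions held in the
RDBMS) are hypothetical, and parent loading and sealing are
omitted.

```python
class FrameTable:
    """Name-to-object table whose lookup faults frames in on a miss."""

    def __init__(self, definitions):
        self.definitions = definitions  # name -> names it references
        self.objects = {}               # name -> frame or stub (a dict)
        self.faulting = False           # set while a frame is being defined

    def lookup(self, name):
        obj = self.objects.get(name)
        if obj is not None:
            return obj                  # loaded frame or existing stub
        if self.faulting:
            # Already faulting a frame in: create a stub, not a nested fault.
            obj = {"name": name, "loaded": False, "refs": []}
            self.objects[name] = obj
            return obj
        return self.fault_in(name)

    def fault_in(self, name):
        self.faulting = True
        try:
            refs = [self.lookup(r) for r in self.definitions[name]]
        finally:
            self.faulting = False
        # Reuse an existing stub so earlier pointers to it stay valid.
        obj = self.objects.setdefault(name, {"name": name})
        obj.update(loaded=True, refs=refs)
        return obj
```

With two frames that reference each other, looking up the first
loads it and leaves the second as a stub; faulting in the second
later fills in that same object, so the pointer already held by the
first frame needs no repair.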

Modifications to Theo. Theo was simpler to modify because stubs are
not needed for Theo, and because Theo performs no sealing of class
(concept) frames. The only change necessary was to the Theo frame
lookup procedure: if Theo does not find that a referenced frame is
defined on the expected property-list entry, a frame fault is
triggered.


4 Experimental Methods

To evaluate the storage subsystem, we want to test how it performs
on a variety of KBs of different types. Ideally, we would have a
series of KBs, each differing from the others in a single aspect,
so as to be able to pinpoint how different factors affect
performance. Unfortunately, finding a set of real KBs that show
such systematic differences is virtually impossible. For this
reason, we have developed a parameterized random KB generator
(RKBG) and concomitant tools, to generate KBs according to input
specifications (but with meaningless data). By altering one
parameter at a time, we can conduct controlled experiments with
interpretable results.

Our random KB suite consists of three tools. The KB generator
creates Loom and Theo KBs with storage characteristics that the
user defines. A simulated KB application generates accesses and
updates to the frames of randomly generated KBs. Finally, the KB
measurer examines real KBs to determine their particular storage
characteristics, so that we have a realistic set of parameters to
feed into the RKBG. All three tools have been implemented and
tested. The KBs used in our timing experiments were all generated
by the RKBG.

4.1 The Random KB Generator

Many attributes of a KB can affect its storage characteristics.
Input parameters to the RKBG allow the user to determine many of
these attributes. We wanted to keep the generator simple, yet
flexible enough to generate nonhomogeneous, realistic-looking KBs.
For example, in our implementation, different slots can take
different datatypes for their fillers; the constraints generated
for a slot will depend on its datatype, just as would be observed
in real KBs. We wanted to provide enough input parameters to enable
users to create a rich and varied assortment of KBs, while
concentrating primarily on parameters we believed to be most
relevant to the storage subsystem and its interactions with Loom.
For example, the average string length affects the size of a frame,
and therefore the amount of data that needs to be retrieved during
a frame fault, so we believed it was a valuable parameter to
include, whereas the range of integer values used as slot fillers
has little effect on storage processing, so we fixed it
arbitrarily.

The parameters to the RKBG include such properties as number of
concepts and instances, average number of slots per frame and
fillers per slot, proportions of various datatypes for slot
fillers, maximum depth of the concept hierarchy, etc. In these
respects, we can make our random KBs very similar to real KBs, and
the parameters for our base random KB are in fact based on the
measurements of a real Loom KB. The shape of the concept hierarchy
and the distribution of instances among concepts is entirely
random, however, so these aspects of our generated KBs do not
necessarily resemble actual KBs.

4.2 The Experiments

The goal of the experiments discussed herein was to measure storage
system performance as a function of knowledge base size. We
therefore chose to keep all RKBG parameters constant except for the
number of instances in the KB. Each KB had 100 concepts, all
primitive, with just one super each. Instances averaged 5 slots
apiece, with an average of 2 fillers per slot. Half the slots were
filled by integers, with the other half filled by symbols. These
parameters were chosen because they approximate the characteristics
of the transportation-planning KB that is driving our work with
Loom [12], as measured by our KB measurer. The same random seed was
used to create every KB, so the concept hierarchy remained the
same, regardless of the number of instances. Knowledge bases were
generated with 500, 1000, 2000, 4000 and 5000 instances. For
comparison, the same set of KBs was generated and saved to native
Loom flat files, to native Theo flat files, and to the RDBMS (both
Loom and Theo versions). These four variations of four KBs form the
basis for our experiments.
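
The effect of the fixed seed can be illustrated with a toy
generator. This is not the RKBG: the function name, the frame
naming scheme, the top concept "Thing", and the uniform
distributions used to hit the stated averages are assumptions
chosen for illustration. Because the concepts are drawn before any
instance, the same seed yields the same hierarchy for every KB
size.

```python
import random


def generate_kb(n_instances, n_concepts=100, avg_slots=5, avg_fillers=2, seed=0):
    """Return (supers, instances) for a random KB built from a fixed seed."""
    rng = random.Random(seed)
    # Concepts come first, so the hierarchy depends only on the seed.
    supers = {}
    for i in range(n_concepts):
        # Exactly one super each: an earlier concept, or the assumed top.
        supers[f"C{i}"] = "Thing" if i == 0 else f"C{rng.randrange(i)}"
    instances = {}
    for j in range(n_instances):
        slots = {}
        # randint(1, 2 * avg - 1) is uniform with mean equal to avg.
        for s in range(rng.randint(1, 2 * avg_slots - 1)):
            n_fill = rng.randint(1, 2 * avg_fillers - 1)
            if rng.random() < 0.5:      # about half the slots hold integers
                fillers = [rng.randrange(1000) for _ in range(n_fill)]
            else:                       # the rest hold symbols
                fillers = [f"sym{rng.randrange(1000)}" for _ in range(n_fill)]
            slots[f"slot{s}"] = fillers
        parent = f"C{rng.randrange(n_concepts)}"
        instances[f"I{j}"] = {"parent": parent, "slots": slots}
    return supers, instances
```

Generating KBs of two different sizes with the default seed returns
identical `supers` dictionaries, which is the property that lets
KB size be varied while the concept hierarchy is held constant.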

Experiments were run using Loom 2.1, and the February 1993 version
of Theo, running on Lucid Common Lisp 4.1.1. Both the FRS and the
RDBMS server were running on the same workstation, a SPARCstation
10 model 41 with 64 MB of physical memory. Lisp was restarted
before every trial, to avoid caching effects, and a garbage
collection was executed immediately before timing. Each trial was
repeated three times, and the results averaged. Overall elapsed
times were measured using the Lisp time function. Measuring the
time spent in Loom, Theo, IDI, and the storage subsystem was done
by monitoring key procedures using the CMU monitoring package. The
CPU time spent in the RDBMS server process was measured using the
UNIX ps utility to observe total CPU time before and after each
experiment.
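
The trial protocol (fresh state, a garbage collection immediately
before timing, three repetitions averaged) can be sketched as a
small harness. This is a Python analogy, not the Lisp tooling
described above: `time_trial`, `setup`, and `run` are hypothetical
names, and rebuilding state through `setup` only approximates
restarting the Lisp process.

```python
import gc
import time


def time_trial(setup, run, repeats=3):
    """Average the elapsed time of run(state) over several fresh trials."""
    elapsed = []
    for _ in range(repeats):
        state = setup()          # fresh state, so no trial benefits from caching
        gc.collect()             # collect garbage immediately before timing
        start = time.perf_counter()
        run(state)
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)
```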

A few experiments were also run with the FRS and the RDBMS running
on different machines connected by a network. These results are not
shown, but they were quite similar to the results from running both
on the same machine. In general, there was more variance in the
results when running on two machines, but the best times in that
configuration were roughly the same as the times on a single
machine. On a single machine, our elapsed times tend to be very
close to the sum of the CPU times for each component, indicating
that system overhead (e.g., from virtual memory swapping) was not a
major factor.

The first set of experiments measured the time required to
reference some number of randomly chosen instances from KBs of
different sizes. Each reference faults in at least one frame from
the RDBMS (when the parent classes of an instance are not memory
resident, they are also faulted in).

Selected results for Loom and Theo are shown in Figure 3. Each of
the dashed lines in these graphs shows the time required to
reference N instances in KBs of different sizes. For example, the
highest line in each graph shows the time required to reference
2000 instances from KBs containing a total of 2000, 4000, and 5000
instances. Figure 3(a) shows that for Theo, the time required to
reference 500 instances from a KB containing 4000 total instances
is about the same as the time required to load that KB in its
entirety from the flat files.

A second set of measurements breaks down the total time spent
processing frame faults into several components: the time spent in
the RDBMS server, the IDI, our storage system, and the FRS (Loom or
Theo). Figure 4 plots these component times as a function of the
number of instances referenced for a fixed KB of 5000 instances.
Figure 4(b) shows how the total time for referencing N instances
breaks down into time spent in Loom, our storage subsystem (SSS),
IDI, the RDBMS, and other processing (presumably I/O). Figure 4(a)
shows an analogous breakdown for Theo.

The third experiment measured the time required to save updates to
some number of randomly chosen instances from KBs of various sizes.
To be consistent with traditional Loom behavior, updates are not
written as they occur. Rather, we wait until the user issues a
command to save updates, and then all are written at once in a
single transaction. We varied the number of frames updated between
10 and 1000. Selected results are shown in Figure 5. For
comparison, we have included the time to save KBs of varying sizes
to Loom flat files (the time is constant for a given KB regardless
of the number of frames updated in that KB). KB save times for Theo
are similar, and thus are not shown.
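
The deferred-save behavior can be sketched as a set of modified
frame names that is flushed in one transaction. This is an
illustrative sketch only: the `Store` class, its one-table layout,
and the use of Python's sqlite3 module in place of the commercial
RDBMS are assumptions, not the storage subsystem itself.

```python
import sqlite3


class Store:
    """Buffers frame updates in memory and writes them on an explicit save."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE frames (name TEXT PRIMARY KEY, definition TEXT)")
        self.memory = {}    # name -> current definition held in memory
        self.dirty = set()  # names modified since the last save

    def update(self, name, definition):
        # Nothing is written to the database at update time.
        self.memory[name] = definition
        self.dirty.add(name)

    def save(self):
        rows = [(name, self.memory[name]) for name in sorted(self.dirty)]
        with self.db:  # one transaction: commit on success, roll back on error
            self.db.executemany(
                "INSERT OR REPLACE INTO frames VALUES (?, ?)", rows)
        self.dirty.clear()
        return len(rows)
```

The number of rows written by `save` equals the number of distinct
frames updated since the previous save, independent of how many
frames the KB contains, which is the scaling behavior the
experiment is designed to measure.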

5 Discussion

Our primary goals in performing these experiments are to answer
several questions. Does the performance of our RDBMS-based storage
subsystem meet the goal of linear time as a function of number of
frames referenced and number of updates stored? If so, is its speed
fast enough to make the storage system usable in practice? And how
do the different components of the storage subsystem such as the
RDBMS server contribute to its overall performance?

Figure 4 demonstrates that our architecture achieves the linearity
goal: the time spent loading frames is a linear function of the
number of frames referenced.

Figure  3  lets  us  evaluate  the  relative  merits  of  loading 
frames  from  the  RDBMS  versus  from  flat  files.  The  relative 
merit  differs  for  Loom  versus  for  Theo  because  Loom 
takes  significantly  longer  to  load  an  entire  KB  of  K  frames 
than  does  Theo.  The  difference  is  that  Loom  is  performing 
computations  (classification)  on  the  KB  that  Theo  is  not. 
Because  the  same  amount  of  data  is  transferred  for  each  FRS 
during  incremental  loading  of  N  concepts  from  the  same  KB, 
the  database  costs  are  about  the  same.  Therefore  the  ratio 
of  database  costs  to  total  costs  is  higher  for  Theo  than  for 
Loom.  For  Theo,  loading  N  instances  from  the  DBMS  is  8 
times  slower  than  loading  an  entire  KB  of  N  instances  from  a 
flat  file.  But  for  Loom,  loading  N  instances  from  the  DBMS 
is  only  3  times  slower  than  loading  a  KB  of  that  size  from  a 
flat  file.  Therefore  the  performance  of  the  RDBMS  storage 
subsystem  is  on  par  with  a  flat  file  when  a  user  references  up 
to  12%  of  the  frames  in  a  KB  in  a  given  session  for  Theo; 
for  Loom  the  user  can  reference  up  to  30%  of  the  frames  for 
equivalent  performance.  We  take  this  result  to  mean  that 
even  for  Theo  the  performance  of  the  storage  subsystem 
is  acceptable  in  practice  given  our  assumption  that  as  KB 
size  grows,  users  will  reference  only  a  fraction  of  its  frames 
in  a  given  session.  Note  that  RDBMS  loading  also  has  a 
different  response-time  profile  than  does  flat-file  loading: 
flat-file  loading  requires  a  long  wait  at  startup  time,  whereas 
demand  loading  spreads  loading  waits  across  many  operations. 

Figure  3:  The  solid  line  in  each  graph  shows  the  time  required  to  load  entire  KBs  of  varying  sizes  from  flat  files  for  Theo 
(a)  and  Loom  (b).  The  dashed  lines  show  times  required  to  fault  in  frames  from  the  RDBMS  due  to  references  to  instances 
by  the  application.  Each  dashed  line  shows  the  same  number  of  instance  references  as  a  function  of  KB  size.  All  times  refer 
to  total  elapsed  times.  The  vertical  ordering  of  dashed  lines  in  each  graph  and  in  its  legend  is  the  same. 

Figure  3  shows  that  RDBMS  frame  loading  time  depends 
on  KB  size  when  a  fixed  number  of  instances  are  referenced. 
Because  the  parents  of  any  referenced  instances  are  faulted 
in  along  with  the  instances,  a  likely  explanation  is  that  the 
time  to  load  classes  depends  on  the  size  of  the  KB.  When 
a  class  is  faulted  in,  the  names  of  all  its  instances  must  be 
retrieved  from  the  database.  Because  all  of  our  experimen¬ 
tal  KBs  contain  the  same  number  of  classes,  the  number 
of  instances  per  class  increases  in  proportion  to  KB  size, 
requiring  a  greater  amount  of  data  to  be  retrieved  per  class  for 
large  KBs.  We  have  no  data  yet  on  how  the  class/instance 
ratio  depends  on  KB  size  for  real  KBs. 
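
The explanation above can be made concrete with a toy model. The sketch below is not our implementation: ToyFrameStore is a hypothetical in-memory stand-in for the DBMS, each instance is assumed to have a single parent class, instances are assumed to be spread evenly over a fixed number of classes, and the rows_retrieved counter is only a rough proxy for the amount of data transferred. Referencing an instance faults in the instance and, if needed, its parent class, and faulting in a class retrieves the names of all of its instances.

```python
class ToyFrameStore:
    """In-memory stand-in for the DBMS; counts rows handed to the FRS."""

    def __init__(self, num_classes, num_instances):
        # Assumption: instances are spread evenly over a fixed set of classes.
        self.members = {f"class-{c}": [] for c in range(num_classes)}
        self.parent = {}
        for i in range(num_instances):
            cls = f"class-{i % num_classes}"
            self.parent[f"inst-{i}"] = cls
            self.members[cls].append(f"inst-{i}")
        self.rows_retrieved = 0

    def fetch_instance(self, name):
        self.rows_retrieved += 1
        return self.parent[name]

    def fetch_class(self, name):
        # Faulting in a class retrieves the names of all of its instances.
        self.rows_retrieved += len(self.members[name])
        return list(self.members[name])


def reference(store, loaded, name):
    """Fault in an instance and its parent class unless already in memory."""
    if name in loaded:
        return
    cls = store.fetch_instance(name)
    loaded.add(name)
    if cls not in loaded:
        store.fetch_class(cls)
        loaded.add(cls)


def rows_for(num_instances, num_classes=10, num_refs=20):
    """Rows retrieved when the first num_refs instances are referenced."""
    store = ToyFrameStore(num_classes, num_instances)
    loaded = set()
    for i in range(num_refs):
        reference(store, loaded, f"inst-{i}")
    return store.rows_retrieved


small = rows_for(1000)   # 20 references into a KB of 1000 instances
large = rows_for(5000)   # the same 20 references into a KB of 5000 instances
```

In this model the number of references is the same in both runs, yet the larger KB returns more rows because each class fault retrieves a longer list of instance names.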

Figure  5(a)  demonstrates  that,  as  expected,  our  architecture 
achieves  the  goal  of  saving  KB  changes  in  time  linear  in 
the  number  of  updates.  Figure  5(b)  shows  that  the  time  to 
save  frames  is  not  dependent  on  the  size  of  the  KB.  Saving 
N  updated  frames  to  the  RDBMS  is  roughly  5  times  slower 
than  saving  an  entire  KB  of  N  frames  to  a  flat  file.  There¬ 
fore,  our  storage  subsystem  is  faster  than  the  flat  file  when 
less  than  20%  of  the  KB  has  been  altered. 
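
The break-even percentages quoted in this section follow from simple arithmetic, sketched below under a simplifying assumption that is ours and not a measured result: both costs are linear with a constant per-frame cost, so a flat-file operation on a KB of K frames costs c * K and an incremental operation on n frames costs s * c * n, where s is the slowdown factor. The two are equal when n / K = 1 / s. The function name break_even_fraction is hypothetical.

```python
def break_even_fraction(slowdown):
    """Fraction of a KB at which incremental and flat-file costs are equal.

    Assumes the flat-file cost is c * K for a KB of K frames and the
    incremental cost is slowdown * c * n for n frames, so the costs are
    equal when n / K = 1 / slowdown.
    """
    if slowdown <= 0:
        raise ValueError("slowdown must be positive")
    return 1.0 / slowdown


# Slowdown factors quoted in the text of this section.
factors = {"Theo load": 8, "Loom load": 3, "save": 5}
thresholds = {name: break_even_fraction(s) for name, s in factors.items()}

for name, frac in thresholds.items():
    print(f"{name}: break-even at {frac:.1%} of the KB")
```

The factors of 8, 3, and 5 give thresholds of one eighth, one third, and one fifth of the KB, which the text rounds to 12%, 30%, and 20%.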

6  Related  Work  on  FRS  Storage  Systems 

KEEconnection  couples  the  Kee  FRS  with  a  relational  DBMS 
[1]  and  the  IDI  couples  Loom  with  a  relational  DBMS  [10]. 
They  are  examples  of  architectures  (c)  and  (d)  in  Figure  1,  in 
which  the  DBMS  and  FRS  are  loosely  coupled  peers.  The 
advantage  of  these  architectures  is  that  they  allow  existing 
information  from  a  database  to  be  imported  into  an  AI 
environment.  The  drawback  is  that  these  architectures  do  not 
transparently  enhance  the  storage  capabilities  of  Loom  as 
does  our  approach.  Users  of  KEEconnection  (and  of  the 
IDI)  must  define  a  mapping  between  a  class  frame  and  a  ta¬ 
ble  in  the  RDBMS;  KEEconnection  creates  frame  instances 
from  analogously  structured  tuples  stored  in  the  RDBMS, 
and  can  store  instance  frames  out  to  the  DBMS.  But  note 
that  only  slot  values  in  instance  frames  can  be  transferred  to 
the  database  —  class  frames  are  not,  so  this  information  is 
not  persistently  stored  using  database  techniques  and  can¬ 
not  be  accessed  by  multiple  users.  Our  approach  allows  all 
information  in  a  Loom  KB  to  be  permanently  stored  in  the 
DBMS. 

Groups  at  IBM  and  at  MCC  have  coupled  FRSs  to  OODBMSs. 
Mays  et  al.  coupled  the  K-REP  system  to  the  Statice  OODBMS 
[9],  and  Ballou  et  al.  coupled  the  PROTEUS  FRS  to  the 
Orion  OODBMS  [2].  The  IBM  effort  differs  from  our  ap¬ 
proach  in  that  a  KB  is  read  from  the  OODBMS  in  its  en¬ 
tirety  when  it  is  first  referenced  by  a  K-REP  user,  which  we 
believe  will  be  unacceptably  slow  for  large  KBs. 

Unfortunately,  none  of  these  researchers  have  published 
experimental  investigations  of  alternative  implementation 
choices,  as  we  are  doing.  Without  systematic  experiments 
it  is  impossible  to  evaluate  the  relative  merits  of  the  many 
possible  alternative  architectures. 

Figure  4:  The  total  elapsed  time  for  referencing  and  faulting  N  instances  into  memory  from  a  KB  of  5000  instances  is  separated 
into  several  components.  (a)  The  vertical  distances  between  lines  represent  time  spent  in  Theo  plus  our  storage  subsystem 
(SSS),  in  the  IDI,  and  in  the  RDBMS  server.  (b)  The  vertical  distances  between  lines  represent  time  spent  in  Loom,  in  our 
storage  subsystem  (including  stub  creation),  in  the  IDI,  and  in  the  RDBMS  server.  The  solid  line  is  an  elapsed  time,  whereas 
all  of  the  broken  lines  show  CPU  times. 


7  Summary  and  Future  Work 

An  FRS  that  performs  demand  loading  of  referenced  frames, 
combined  with  incremental  saving  of  updated  frames,  will 
scale  to  large  KBs  much  more  gracefully  than  an  FRS  that 
can  only  load  or  save  frames  in  their  entirety.  We 
presented  an  architecture  for  an  FRS  storage  subsystem  that 
submerges  a  DBMS  within  the  FRS  in  a  manner  that  is 
transparent  to  the  FRS  user.  Our  experimental  results  with 
a  prototype  implementation  show  that  this  coupling  per¬ 
forms  well  in  practice,  and  that  its  performance  is  linear  in 
the  number  of  frames  referenced  or  updated,  as  required. 

Our  future  work  will  evaluate  a  number  of  variations  on 
our  current  architecture,  such  as  different  RDBMS  schemas, 
faulting  multiple  related  frames  into  memory  as  a  unit,  and 
the  use  of  other  types  of  DBMSs,  such  as  object-oriented 
DBMSs.  Our  experiments  will  involve  several  real  KBs  in 
addition  to  synthetic  KBs.  We  are  also  investigating  new 
paradigms  of  controlling  multiuser  access  to  shared  KBs. 